<!DOCTYPE group PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<group>

%tableofcontents;

<p>The <a href="dox:sift;"><em>Scale-Invariant Feature Transform
(SIFT)</em></a> bundles a feature detector and a feature
descriptor. The detector extracts from an image a number of frames
(attributed regions) in a way which is consistent with (some)
variations of the illumination, viewpoint and other viewing
conditions. The descriptor associates to the regions a signature which
identifies their appearance compactly and robustly. For a more
in-depth description of the algorithm, see our
<a href="%pathto:root;api/sift_8h.html">API reference for SIFT</a>.</p>

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<h1 id="tut.sift.extract">Extracting frames and descriptors</h1>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

<p>Both the detector and descriptor are accessible by
the <code>vl_sift</code> MATLAB command (there is a similar command
line utility). Open MATLAB and load a test image</p>

<precode type='matlab'>
I = vl_impattern('roofs1') ;
image(I) ;
</precode>

<div class="figure">
 <img src="%pathto:root;demo/sift_basic_0.jpg"/>
 <div class="caption">
  <span class="content">
   Input image.
  </span>
 </div>
</div>

<p>The <code>vl_sift</code> command requires a single precision gray
scale image. It also expects the range to be normalized in the [0,255]
interval (while this is not strictly required, the default values of
some internal thresholds are tuned for this case). The image
<code>I</code> is converted in the appropriate format by</p>

<precode type='matlab'>
I = single(rgb2gray(I)) ;
</precode>

<p>We compute the SIFT frames (keypoints) and descriptors by</p>

<precode type='matlab'>
[f,d] = vl_sift(I) ;
</precode>

<p>The matrix <code>f</code> has a column for each frame. A frame is a
disk of center <code>f(1:2)</code>, scale <code>f(3)</code> and
orientation <code>f(4)</code> . We visualize a random selection of 50
features by:</p>

<precode type="matlab">
perm = randperm(size(f,2)) ;
sel = perm(1:50) ;
h1 = vl_plotframe(f(:,sel)) ;
h2 = vl_plotframe(f(:,sel)) ;
set(h1,'color','k','linewidth',3) ;
set(h2,'color','y','linewidth',2) ;
</precode>

<div class="figure">
 <img src="%pathto:root;demo/sift_basic_2.jpg"/>
 <div class="caption">
  Some of the detected SIFT frames.
 </div>
</div>

<p>We can also overlay the descriptors by</p>

<precode type='matlab'>
h3 = vl_plotsiftdescriptor(d(:,sel),f(:,sel)) ;
set(h3,'color','g') ;
</precode>

<div class="figure">
 <img src="%pathto:root;demo/sift_basic_3.jpg"/>
 <div class="caption">
  A test image for the peak threshold parameter.
 </div>
</div>

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<h1 id="tut.sift.match">Basic matching</h1>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

<p>SIFT descriptors are often used find similar regions in two
images. <code>vl_ubcmatch</code> implements a basic matching
algorithm. Let <code>Ia</code> and <code>Ib</code> be images of the
same object or scene. We extract and match the descriptors by:
</p>

<precode type='matlab'>
[fa, da] = vl_sift(Ia) ;
[fb, db] = vl_sift(Ib) ;
[matches, scores] = vl_ubcmatch(da, db) ;
</precode>

<div class="figure">
 <img src="%pathto:root;demo/sift_match_1.jpg"/>
 <img src="%pathto:root;demo/sift_match_2.jpg"/>
 <div class="caption">
  Top: A pair of images of the same scene. Bottom: Matching of SIFT
  descriptors with <code>vl_ubcmatch</code>.
 </div>
</div>

<p>
For each descriptor in <code>da</code>, <code>vl_ubcmatch</code> finds
the closest descriptor in <code>db</code> (as measured by the L2 norm
of the difference between them). The index of the original match and
the closest descriptor is stored in each column of
<code>matches</code> and the distance between the pair is stored in
<code>scores</code>.
</p>

<p>
Matches also can be filtered for uniqueness by passing a third
parameter to <code>vl_ubcmatch</code> which specifies a threshold.
Here, the uniqueness of a pair is measured as the ratio of the
distance between the best matching keypoint and the distance to the
second best one (see <code>vl_ubcmatch</code> for further details).
</p>


<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<h1 id="tut.sift.param">Detector parameters</h1>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

<p>The SIFT detector is controlled mainly by two parameters: the peak
 threshold and the (non) edge threshold.</p>


<p>The <em>peak threshold</em> filters peaks of the DoG scale space
  that are too small (in absolute value). For instance, consider a
  test image of 2D Gaussian blobs:</p>

<precode type='matlab'>
I = double(rand(100,500) &lt;= .005) ;
I = (ones(100,1) * linspace(0,1,500)) .* I ;
I(:,1) = 0 ; I(:,end) = 0 ;
I(1,:) = 0 ; I(end,:) = 0 ;
I = 2*pi*4^2 * vl_imsmooth(I,4) ;
I = single(255 * I) ;
</precode>

<div class="figure">
 <img src="%pathto:root;demo/sift_peak_0.jpg"/>
  <div class="caption">
   A test image for the peak threshold parameter.
  </div>
</div>

<p>We run the detector with peak threshold <code>peak_thresh</code> by</p>

<precode type='matlab'>
f = vl_sift(I, 'PeakThresh', peak_thresh) ;
</precode>

<p>obtaining fewer features as <code>peak_thresh</code> is increased.</p>

<div class="figure">
 <img src="%pathto:root;demo/sift_peak_1.jpg"/>
 <img src="%pathto:root;demo/sift_peak_2.jpg"/>
 <img src="%pathto:root;demo/sift_peak_3.jpg"/>
 <img src="%pathto:root;demo/sift_peak_4.jpg"/>
 <div class="caption">
   <span class="content">
     Detected frames for increasing peak threshold.<br/>
     From top: <code>peak_thresh = {0, 10, 20, 30}</code>.
  </span>
 </div>
</div>

<p>The <em>edge threshold</em> eliminates peaks of
the DoG scale space whose curvature is too small (such peaks yield
badly localized frames). For instance, consider the test image</p>

<precode type='matlab'>
I = zeros(100,500) ;
for i=[10 20 30 40 50 60 70 80 90]
I(50-round(i/3):50+round(i/3),i*5) = 1 ;
end
I = 2*pi*8^2 * vl_imsmooth(I,8) ;
I = single(255 * I) ;
</precode>

<div class="figure">
<img src="%pathto:root;demo/sift_edge_0.jpg"/>
<div class="caption">
<span class="content">
A test image for the edge threshold parameter.
</span>
</div>
</div>

<p>We run the detector with edge threshold <code>edge_thresh</code> by</p>

<precode type='matlab'>
f = vl_sift(I, 'edgethresh', edge_thresh) ;
</precode>

<p>obtaining more features as <code>edge_thresh</code> is increased:</p>

<div class="figure">
<img src="%pathto:root;demo/sift_edge_1.jpg"/>
<img src="%pathto:root;demo/sift_edge_2.jpg"/>
<img src="%pathto:root;demo/sift_edge_3.jpg"/>
<img src="%pathto:root;demo/sift_edge_4.jpg"/>
<div class="caption">
<span class="content">
  Detected frames for increasing edge threshold.<br/>
  From top: <code>edge_thresh = {3.5, 5, 7.5, 10}</code>
</span>
</div>
</div>

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<h1 id="tut.sift.custom">Custom frames</h1>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

<p>The MATLAB command <code>vl_sift</code> (and the command line utility)
can bypass the detector and compute the descriptor on custom frames using
the <code>Frames</code> option.</p>

<p>For instance, we can compute the descriptor of a SIFT frame
centered at position <code>(100,100)</code>, of scale <code>10</code>
and orientation <code>-pi/8</code> by</p>

<precode type='matlab'>
fc = [100;100;10;-pi/8] ;
[f,d] = vl_sift(I,'frames',fc) ;
</precode>

<div class="figure">
<img src="%pathto:root;demo/sift_basic_4.jpg"/>
<div class="caption">
<span class="content">
  Custom frame at with fixed orientation.
</span>
</div>
</div>

 <p>Multiple frames <code>fc</code> may be specified as well. In this
  case they are re-ordered by increasing
  scale. The <code>Orientations</code> option instructs the program to
  use the custom position and scale but to compute the keypoint
  orientations, as in</p>

<precode type='matlab'>
fc = [100;100;10;0] ;
[f,d] = vl_sift(I,'frames',fc,'orientations') ;
</precode>

 <div class="figure">
<img src="%pathto:root;demo/sift_basic_5.jpg"/>
<div class="caption">
<span class="content">
Custom frame with computed orientations.
</span>
</div>
</div>

<p>Notice that, depending on the local appearance, a keypoint may
have <em>multiple</em> orientations.  Moreover, a keypoint computed on
a constant image region (such as a one pixel region) has no
orientations!</p>

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<h1 id="tut.sift.conventions">Conventions</h1>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

<p>In our implementation SIFT frames are expressed in the standard
image reference.  The only difference between the command line and
MATLAB drivers is that the latter assumes that the image origin
(top-left corner) has coordinate (1,1) as opposed to (0,0). Lowe's
original implementation uses a different reference system, illustrated
next:</p>

<div class="figure">
<img src="%pathto:root;figures/sift-conv-vlfeat.png"/> <br/>
<img src="%pathto:root;figures/sift-conv.png"/>
<div class="caption">
<span class="content">
Our conventions (top) compared to Lowe's (bottom).
</span>
</div>
</div>

 <p>Our implementation uses the standard image reference system, with
  the <code>y</code> axis pointing downward. The frame
  orientation <code>&theta;</code> and descriptor use the same reference
  system (i.e. a small positive rotation of the <code>x</code> moves it
  towards the <code>y</code> axis). Recall that each descriptor element
  is a bin indexed by <code>(&theta;,x,y)</code>; the histogram is
  vectorized in such a way that <code>&theta;</code> is the fastest
  varying index and <code>y</code> the slowest.</p>

 <p>By comparison, D. Lowe's implementation (see bottom half of the
  figure) uses a slightly different convention: Frame centers are
  expressed relatively to the standard image reference system, but the
  frame orientation and the descriptor assume that the <em> y </em>
  axis points upward. Consequently, to map from our to D. Lowe's
  convention, frames orientations need to be negated and the descriptor
  elements must be re-arranged.</p>

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<h1 id="tut.sift.ubc">Comparison with D. Lowe's SIFT</h1>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

<p>VLFeat SIFT implementation is largely compatible with
<a href="http://www.cs.ubc.ca/~lowe/keypoints/">UBC (D. Lowe's)
implementation</a> (note however that the keypoints are stored in a
slightly different format, see <code>vl_ubcread</code>). The following
figure compares SIFT keypoints computed by the VLFeat (blue) and UBC
(red) implementations.</p>

<div class="figure">
<img src="%pathto:root;demo/sift_vs_ubc_1.jpg"/>
<div class="caption">
<span class="content">
VLFeat keypoints (blue) superimposed to D. Lowe's keypoints
(red). Most keypoints match nearly exactly.
</span>
</div>
</div>

<p>The large majority of keypoints correspond nearly exactly. The
following figure shows the percentage of keypoints computed by the two
implementations whose center matches with a precision of at least 0.01
pixels and 0.05 pixels respectively.</p>

<div class="figure">
<img src="%pathto:root;demo/sift_vs_ubc_2.jpg"/>
<div class="caption">
<span class="content">
Percentage of keypoints obtained from the VLFeat and UBC implementations
that match up to 0.01 pixels and 0.05 pixels.
</span>
</div>
</div>

<p>Descriptors are also very similar. The following figure shows the
percentage of descriptors computed by the two implementations whose
distance is less than 5%, 10% and 20% of the average descriptor
distance.</p>

<div class="figure">
<img src="%pathto:root;demo/sift_vs_ubc_3.jpg"/>
<div class="caption">
<span class="content">
Percentage of descriptors obtained from the VLFeat and UBC
implementations whose distance is within 10% and 20% of the average
descriptor distance.
</span>
</div>
</div>

</group>
